Correction Approach to Word Segmentation
Authors
Abstract
A number of word segmentation algorithms have been proposed in the past; however, there is still room for improvement. Co-occurrence-Based Error Correction (CBEC), the approach proposed in this chapter, is a novel Thai word segmentation approach designed to provide accurate segmentation results based on context and purpose. CBEC first segments the input string quickly using any available algorithm; maximal matching was used in the experiment. Next, CBEC checks this segmentation output against an error risk data bank to determine whether there is any error risk. The error risk data bank is built from a training corpus; the current version is based on the training corpus made available at BEST 2009. Then, CBEC re-segments the input string using the co-occurrence score of the word sequence to ensure the accuracy of the segmentation result. DOI: 10.4018/978-1-61350-447-5.ch023
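The three stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say what the error risk data bank stores or how the co-occurrence score is computed, so the sketch assumes a set of risky tokens (`risk_bank`) and a sum of adjacent-pair counts (`bigram_counts`). It also assumes that re-segmentation runs only when a risk is flagged, and it uses greedy longest matching as a simple stand-in for the first-pass segmenter. All function names and the toy English data are hypothetical.

```python
def greedy_longest_match(text, lexicon):
    """First pass: greedy longest-match segmentation (a stand-in segmenter).
    Characters that start no lexicon word are emitted as single tokens."""
    max_len = max(map(len, lexicon))
    tokens, i = [], 0
    while i < len(text):
        # Try the longest span first; fall back to a single character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


def all_segmentations(text, lexicon):
    """Enumerate every way to split text into lexicon words."""
    if not text:
        return [[]]
    results = []
    for j in range(1, len(text) + 1):
        head = text[:j]
        if head in lexicon:
            for rest in all_segmentations(text[j:], lexicon):
                results.append([head] + rest)
    return results


def cooccurrence_score(tokens, bigram_counts):
    """Assumed score: sum of training-corpus counts of adjacent word pairs."""
    return sum(bigram_counts.get(pair, 0) for pair in zip(tokens, tokens[1:]))


def cbec_segment(text, lexicon, risk_bank, bigram_counts):
    """Segment, check against the risk bank, re-segment if a risk is found."""
    first_pass = greedy_longest_match(text, lexicon)
    if not any(token in risk_bank for token in first_pass):
        return first_pass  # no flagged token: keep the fast result
    candidates = all_segmentations(text, lexicon) or [first_pass]
    return max(candidates, key=lambda c: cooccurrence_score(c, bigram_counts))


# Toy data (hypothetical): "theta" is flagged as a frequent segmentation error.
lexicon = {"the", "theta", "table", "ble"}
risk_bank = {"theta"}
bigram_counts = {("the", "table"): 5}

print(greedy_longest_match("thetable", lexicon))
print(cbec_segment("thetable", lexicon, risk_bank, bigram_counts))
```

In this toy run the greedy first pass picks the longer word "theta" and leaves "ble"; because "theta" is in the assumed risk bank, the candidates are re-ranked and the pair with the higher co-occurrence count is chosen. Enumerating all candidates is exponential in the worst case, so a realistic system would restrict re-segmentation to the window around the flagged token.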
Similar resources
Co-Occurrence-Based Error Correction Approach to Word Segmentation
To overcome the problems in Thai word segmentation, a number of word segmentation approaches have been proposed over the years. We propose a novel Thai word segmentation approach called Co-occurrence-Based Error Correction (CBEC). CBEC generates all possible segmentation candidates using the classical maximal matching algorithm and then selects the most accurate segmentation ...
Comparison of state-of-the-art atlas-based bone segmentation approaches from brain MR images for MR-only radiation planning and PET/MR attenuation correction
Introduction: Magnetic Resonance (MR) imaging has emerged as a valuable tool in radiation treatment (RT) planning as well as Positron Emission Tomography (PET) imaging owing to its superior soft-tissue contrast. Because there is no direct transformation from voxel intensity in MR images into electron density, it's crucial to generate a pseudo-CT (Computed Tomography) image ...
Non-Deterministic Segmentation for Chinese Lattice Parsing
Parsing Chinese critically depends on correct word segmentation for the parser since incorrect segmentation inevitably causes incorrect parses. We investigate a pipeline approach to segmentation and parsing using word lattices as parser input. We compare CRF-based and lexicon-based approaches to word segmentation. Our results show that the lattice parser is capable of selecting the correction s...
Word segmentation in Persian continuous speech using F0 contour
Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...
Handwritten ZIP code recognition using lexicon free word recognition algorithm
This paper describes a new approach to ZIP code recognition using a word recognition algorithm, where a numeral string is recognized as a word. This paper also describes an end-to-end ZIP code recognition system consisting of tilt/slant correction, line segmentation, word segmentation, ZIP code location, as well as the ZIP code recognition. Evaluation tests are performed using address block ima...
Journal title:
Volume, issue:
Pages: -
Publication date: 2016